This report includes a description and the results of the analyses requested by Elisa Jentho at 08/06/2021:
1 - Do functional enrichment analysis using only the functional database GO:BP.
2 - Re-do heatmaps of differential expression genes using log2 fold change values.
3 - Plot additional markers in the integrated overlay UMAPs.
The R version used was 3.6.3 (R Core Team 2020). This report was built with the rmarkdown R package (v.2.5) (Allaire et al. 2020; Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020). See the full list of packages and versions used at the end of this report - R packages used and respective versions.
The integrated 10x Lox2 and Cre3 samples processed previously were imported to plot the markers upon the UMAPs with the Seurat R package (v.4.0.0) (Satija et al. 2015; Butler et al. 2018; Stuart et al. 2019; Hao et al. 2020).
The figures and tables displayed through the report can be download in pdf and tab-separated format by clicking on the bottom left Download plot and Download table button that appears after each plot and table, respectively.
## Set seed
set.seed(seed = 1024) # to keep reproducibility
## Import packages
library("dplyr", quietly = TRUE)
library("DT", quietly = TRUE) # package to print nice in the html report
library("Seurat", quietly = TRUE)
library("gplots", quietly = TRUE)
library("Matrix", quietly = TRUE)
library("ggplot2", quietly = TRUE)
library("tidyr", quietly = TRUE)
library("readxl", quietly = TRUE)
library("gprofiler2", quietly = TRUE)
library("ComplexHeatmap", quietly = TRUE)
The Cre3 and Lox2 integrated samples were imported.
## Import Seurat object with the re3 & Lox2 integrated samples
# import obj integrated
seu <- readRDS(file = "../results/int_28_05_21/R_objects/seu.rds")
It was also imported the differential gene expression tables obtained before.
## Import DGE tables
dge_tables <- "../results/int_28_05_21/tables/dge_tables"
dge2import <- list.files(path = dge_tables, pattern = "clt_", full.names = TRUE)
names(dge2import) <- lapply(basename(dge2import), function(x) strsplit(x = x, split = "_")[[1]][1:2] %>%
unlist(.) %>% paste(., collapse = "_")) %>% unlist(.)
dge <- lapply(dge2import, function(x) read.table(file = x, header = TRUE, sep = "\t",
stringsAsFactors = FALSE))
It was also imported the markers to plot in the overlay integrated UMAPs as well as the markers for cluster 8 from the integrated Cre3 sample alone ("PCHAS-130590", "PCHAS-130630", "PCHAS-130770", "PCHAS-131600", "PCHAS-134500").
## Plot overlay markers for cluster 8
markers_clt8 <- c("PCHAS-130590", "PCHAS-130630", "PCHAS-130770", "PCHAS-131600", "PCHAS-134500")
# import new markers
markers2import <- "../data/markers_to_plot_08_06_21/MArkers.xlsx"
markers_all <- read_excel(path = markers2import, col_names = FALSE)
colnames(markers_all) <- "Genes"
The new markers to plot are highlighted below.
In order to perform a functional enrichment analysis between the differentially expressed up- and down-regulated genes obtained from each pairwise comparison between each cluster across Cre3 vs Lox2, it was used the gprofiler2 R package (v.0.2.0) (Kolberg and Raudvere 2020), an interface to the g:Profiler web browser tool. The function gost() was applied in order to perform enrichment based on the list of up- or down-regulated genes, between each pairwise comparison (independently), against the annotated genes (domain_scope = “annotated”) of the organism Plasmodium chabaudi (organism = "pchabaudi"). The gene lists were ordered by increasing adjusted p-value (ordered_query = TRUE) in order to generate a GSEA (Gene Set Enrichment Analysis) style p-values. This allows to start the enrichment testing from the top most biological relevant genes with subsequent tests involving larger sets of genes. In addition, only statistically significant (user_threshold = 0.05) enriched functions are returned (significant = TRUE) after multiple testing correction with the default method g:SCS (correction_method = "g_SCS"). Finally evidence codes are added to the final result (evcodes = TRUE). The functional enrichment analysis was performed only against the GO:BP functional database by providing the option sources = "GO:BP".
It was tested functional enrichment for all clusters, with the exception of cluster 8, because it lacks differential gene expression data. It was given individually a list of up- (avg_log2FC > 0) or downregulated (avg_log2FC < 0) genes ranked (from the most significant to the lowest) by adjusted p-values (only genes with an adjusted p-value < 0.05).
The result of this analysis is a plot and a table for each up- and down-regulated gene list from each cluster pairwise comparison across Cre3 vs Lox2, that shows the functional enrichment based on the -log10 of the adjusted p-value across different functional databases (see above). Only significant functions are reported. The table contains the following columns/information (the information below was copied entirely from the original vignette here):
query: the name of the input query which by default is the order of query with the prefix “query_”. This can be changed by using a named list input.
significant: indicator for statistically significant results
p_value: hypergeometric p-value after correction for multiple testing
term_size: number of genes that are annotated to the term
query_size: number of genes that were included in the query. This might be different from the size of the original list if:
any genes were mapped to multiple Ensembl gene IDs
any genes failed to be mapped to Ensembl gene IDs
the parameter ordered_query = TRUE and the optimal cutoff for the term was found before the end of the query
the domain_scope was set to “annotated” or “custom”
intersection_size: the number of genes in the input query that are annotated to the corresponding term
precision: the proportion of genes in the input list that are annotated to the function (defined as intersection_size/query_size)
recall: the proportion of functionally annotated genes that the query recovers (defined as intersection_size/term_size)
term_id: unique term identifier (e.g GO:0005005)
source: the abbreviation of the data source for the term (e.g. GO:BP)
term_name: the short name of the function
effective_domain_size: the total number of genes “in the universe” used for the hypergeometric test
source_order: numeric order for the term within its data source (this is important for drawing the results)
parents: list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node.
In the bubble plots represented below, functions that have a -log10(adjusted p-value)>16 will be capped; i.e., they will appear close to the vertical dashed black line. All these results reported below were run at 09/06/2021 using the archived version of the gprofiler2 server - Ensembl 102, Ensembl Genomes 49 (database built on 2020-12-15): https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15/gost.
## Get up and down gene lists and parse them
reg_gene_list <- list()
for ( clt in names(dge) ) { # select dge by cluster and retrieve name
sub_df <- dge[[clt]]
#sub_df[,"Geneid"] <- rownames(sub_df)
gene_ids_up <- sub_df %>%
filter(p_val_adj < 0.05 & avg_log2FC > 0) %>%
arrange(p_val_adj) %>%
pull(Geneid)
gene_ids_down <- sub_df %>%
filter(p_val_adj < 0.05 & avg_log2FC < 0) %>%
arrange(p_val_adj) %>%
pull(Geneid)
reg_gene_list[[clt]] <- list()
reg_gene_list[[clt]][["up"]] <- gsub(pattern = "PCHAS-", replacement = "PCHAS_", x = gene_ids_up)
reg_gene_list[[clt]][["down"]] <- gsub(pattern = "PCHAS-", replacement = "PCHAS_", x = gene_ids_down)
}
# create folders.
r_object_folder <- "../results/func_enrich_plots_09_06_21/R_objects"
if ( ! dir.exists(r_object_folder) ) dir.create(r_object_folder, recursive = TRUE)
func_enrich_folder <- "../results/func_enrich_plots_09_06_21/tables/functional_enrichment"
if ( ! dir.exists(func_enrich_folder) ) dir.create(func_enrich_folder, recursive = TRUE)
### Functional enrichment of DEG
## Run gprofiler2
func_enrich <- list()
set_base_url("https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15")
for ( clt in names(reg_gene_list) ){ # loop over list and do functional enrichment
#print(get_base_url())
func_enrich[[clt]] <- list()
set.seed(1024)
func_enrich[[clt]][["up"]] <- gost(query = reg_gene_list[[clt]][["up"]],
organism = "pchabaudi", ordered_query = TRUE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = TRUE,
user_threshold = 0.05, correction_method = "g_SCS",
domain_scope = "annotated", custom_bg = NULL,
numeric_ns = "", sources = "GO:BP")
if ( ! is.null(func_enrich[[clt]][["up"]]$result) ) {
sub_df_up <- func_enrich[[clt]][["up"]]$result %>%
apply(X = ., MARGIN = 2, FUN = function(x) as.character(x))
write.table(x = sub_df_up,
file = paste(func_enrich_folder, paste0(clt, "_up_functional_enrichment_table.tsv"),
sep = "/"),
quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
}
set.seed(1024)
func_enrich[[clt]][["down"]] <- gost(query = reg_gene_list[[clt]][["down"]],
organism = "pchabaudi", ordered_query = TRUE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = TRUE,
user_threshold = 0.05, correction_method = "g_SCS",
domain_scope = "annotated", custom_bg = NULL,
numeric_ns = "", sources = "GO:BP")
if ( ! is.null(func_enrich[[clt]][["down"]]$result) ) {
sub_df_down <- func_enrich[[clt]][["down"]]$result %>%
apply(X = ., MARGIN = 2, FUN = function(x) as.character(x))
write.table(x = sub_df_down,
file = paste(func_enrich_folder, paste0(clt, "_down_functional_enrichment_table.tsv"),
sep = "/"),
quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
}
}
# Create R object
saveRDS(object = func_enrich, file = paste(r_object_folder, "func_enrich.rds", sep = "/"))
# Import functional enrichment object to avoid to have to run the API app
func_enrich <- readRDS(file = paste(r_object_folder, "func_enrich.rds", sep = "/"))
Among the clusters tested, it was only found significant functional enrichment pathways for cluster 3 (down), 4 (up) and 7 (up).
## Functional enrichments - barplots
# Create folder to save results
func_enrich_barplots_folder <- "../results/func_enrich_plots_09_06_21/plots/func_enrich_barplots"
if ( ! dir.exists(func_enrich_barplots_folder) ) dir.create(func_enrich_barplots_folder, recursive = TRUE)
# Plot and save data
func_enrich_list <- list()
for ( clt in names(func_enrich) ) { # loop over cluster
if ( ! is.null(names(func_enrich[[clt]])) ) { # if cluster is not NULL
func_enrich_list[[clt]] <- list()
for ( reg in names(func_enrich[[clt]]) ) { # loop over regulation: 'up' and 'down'
if ( reg == "up" ) {
color <- "#E64B35FF"
} else {
color <- "#4DBBD5FF"
}
# Plot barplot of functional enrichment
func_enrich_list[[clt]][[reg]] <- func_enrich[[clt]][[reg]]$result %>%
arrange(source, p_value) %>%
mutate(term_name = factor(term_name, levels = rev(unique(term_name)))) %>%
ggplot(data = ., mapping = aes(x = term_name, y = -log10(p_value))) + #, fill = source)) +
geom_bar(stat = "identity", fill = color) +
#ggsci::scale_fill_npg(name = "Source") +
#facet_grid( source ~ . , scales = "free_y", space = "free") +
coord_flip() +
theme_bw() +
ylab("-log10 (adjusted p-value)") +
xlab("Term name") +
theme(axis.text = element_text(size = 10, color = "black"))
# save
ggsave(filename = paste(func_enrich_barplots_folder, paste0(clt, "_", reg, "_func_enrich_barplot.pdf"), sep = "/"),
plot = func_enrich_list[[clt]][[reg]], width = 6, height = 4)
}
}
}
print(func_enrich_list$clt_3$down)
print(func_enrich_list$clt_4$up)
print(func_enrich_list$clt_7$up)
A gene was considered differentially expressed (and plotted) if it has an absolute log2FC>0 and an adjusted p-value<0.05. It was used the R package ComplexHeatmap (v.2.2.0) (Gu, Eils, and Schlesner 2016). The same color gradient scale ranging from -4 to 4 was giving to all the heatmaps in order to make them comparable. For the heatmaps with more genes differentially expressed, only the top 10 up- and downregulated genes were labeled.
# path to save plots
dge_folder <- "../results/func_enrich_plots_09_06_21/plots/dge"
if ( ! dir.exists(dge_folder) ) dir.create(dge_folder)
## DGE - heatmaps - clt 0
# data
cluster <- "clt_0"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:2,45:54),
labels = row.names(sub_df)[c(1:2,45:54)],
labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt0 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = FALSE,
right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt0)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt0)
## DGE - heatmaps - clt 1
# data
cluster <- "clt_1"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,44:53),
labels = row.names(sub_df)[c(1:10,44:53)],
labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt1 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = FALSE,
right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt1)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt1)
## DGE - heatmaps - clt 2
# data
cluster <- "clt_2"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:9,55:64),
labels = row.names(sub_df)[c(1:9,55:64)],
labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt2 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = FALSE,
right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt2)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt2)
## DGE - heatmaps - clt 3
# data
cluster <- "clt_3"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,196:205),
labels = row.names(sub_df)[c(1:10,196:205)],
labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt3 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = FALSE,
right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt3)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt3)
## DGE - heatmaps - clt 4
# data
cluster <- "clt_4"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,84:93),
labels = row.names(sub_df)[c(1:10,84:93)],
labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt4 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = FALSE,
right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt4)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt4)
## DGE - heatmaps - clt 5
# data
cluster <- "clt_5"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
# row_annot <- rowAnnotation(foo = anno_mark(at = c(1:6,9:18),
# labels = row.names(sub_df)[c(1:6,9:18)],
# labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt5 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = TRUE, row_names_gp = gpar(fontsize = 6.5))#,
#right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt5)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt5)
## DGE - heatmaps - clt 6
# data
cluster <- "clt_6"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
# row_annot <- rowAnnotation(foo = anno_mark(at = c(1:6,9:18),
# labels = row.names(sub_df)[c(1:6,9:18)],
# labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt6 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = TRUE, row_names_gp = gpar(fontsize = 6.5))#,
#right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt6)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt6)
## DGE - heatmaps - clt 7
# data
cluster <- "clt_7"
sub_df <- dge[[cluster]] %>%
filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>%
arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,48:57),
labels = row.names(sub_df)[c(1:10,48:57)],
labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))
# save
set.seed(1024)
heat_clt7 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC",
cluster_columns = FALSE, cluster_rows = FALSE,
show_column_names = FALSE, col = col_fun,
show_row_names = FALSE,
right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"),
width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt7)
dev.off()
## png
## 2
# print
set.seed(1024)
print(heat_clt7)
To plot markers it was used the integrated Seurat object with the Cre3 and Lox2 samples. The assay used was RNA and the slot data with the counts normalized with the LogNormalize method (read more).
The individual plots of the new markers highlighted through the integrated UMAPs can be found at: results/func_enrich_plots_09_06_21/plots/markers_all.
## Plot new markers
# folder to save plots
plot_mkr_folder <- "../results/func_enrich_plots_09_06_21/plots/markers_all"
if ( ! dir.exists(plot_mkr_folder) ) dir.create(plot_mkr_folder)
# check assay
stopifnot(DefaultAssay(seu) == "RNA")
marker_plots <- list()
for ( mkr in markers_all$Genes ) { # loop over marker genes within a cluster
stopifnot( mkr %in% row.names(seu) ) # stop if gene does not exist in the Seurat obj
marker_plots[[mkr]] <- FeaturePlot(object = seu, features = mkr,
slot = "data", reduction = "umap",
pt.size = 0.001)
ggsave(filename = paste(plot_mkr_folder, paste0(mkr, "_gene_UMAP.pdf"), sep = "/"),
plot = marker_plots[[mkr]], height = 5, width = 5)
}
cowplot::plot_grid(plotlist = marker_plots[1:9], ncol = 3)
cowplot::plot_grid(plotlist = marker_plots[10:18], ncol = 3)
cowplot::plot_grid(plotlist = marker_plots[19:27], ncol = 3)
To plot cluster 8 markers it was used the integrated Seurat object with the Cre3 sample only. The assay used was RNA and the slot data with the counts normalized with the LogNormalize method (read more).
The individual plots of the new markers highlighted through the integrated UMAPs can be found at: results/func_enrich_plots_09_06_21/plots/markers_clt8.
## Plot markers for cluster 8 by Lox2 UMAP
# folder to save plots
plot_mkr_clt8_folder <- "../results/func_enrich_plots_09_06_21/plots/markers_clt8"
if ( ! dir.exists(plot_mkr_clt8_folder) ) dir.create(plot_mkr_clt8_folder)
# get Seurat Lox2 obj
cre3 <- SplitObject(object = seu, split.by = "orig.ident")
cre3 <- cre3$Cre3
# save plots
marker_plot_clt8 <- list()
for ( mkr in markers_clt8 ) {
stopifnot( mkr %in% row.names(cre3) ) # stop if gene does not exist in the Seurat obj
marker_plot_clt8[[mkr]] <- FeaturePlot(object = cre3, features = mkr,
slot = "data", reduction = "umap",
pt.size = 0.001)
ggsave(filename = paste(plot_mkr_clt8_folder, paste0(mkr, "_gene_UMAP.pdf"), sep = "/"),
plot = marker_plot_clt8[[mkr]], height = 5, width = 5)
}
# plot all
cowplot::plot_grid(plotlist = marker_plot_clt8, ncol = 3)
Orthology search was done with the gprofiler2 R package (v.0.2.0) (Kolberg and Raudvere 2020), an interface to the g:Profiler web browser tool. The function gorth() was applied in order get orthologous genes of Plasmodium chabaudi chabaudi (v.PCHAS01; source_organism = "pchabaudi") against Plasmodium falciparum 3D7 (v.ASM276v2; target_organism = "pfalciparum"). The query gene list of P. chabaudi genes consisted in all the genes that are expressed in the Seurat integrated object (n=4697). Genes without correspondence were filtered out (filter_na = TRUE). One *P. chabaudi may have more than one orthologous gene (mthreshold = Inf). This conversion was done at 09/06/2021 using the archived version of the gprofiler2 server - Ensembl 102, Ensembl Genomes 49 (database built on 2020-12-15): https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15/gost.
## Convert orthologous genes between P. chabaudi *vs* P. falciparum 3D7
genes_exp <- row.names(seu)
genes_exp <- gsub(pattern = "PCHAS-", replacement = "PCHAS_", x = genes_exp)
set.seed(1024)
set_base_url("https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15")
orth_pchabaudi_genes <- gorth(query = genes_exp, source_organism = "pchabaudi", target_organism = "pfalciparum")
write.table(file = "../results/func_enrich_plots_09_06_21/tables/orthologous_genes_Pchabaudi_vs_Pfalciparum3D7.tsv",
x = orth_pchabaudi_genes, sep = "\t", row.names = FALSE, quote = FALSE)
Among the P. chabaudi 4697 genes queried, it was only found 4410 hits presented below.
Folders inside the project folder:
results: folder that contains all the results obtained in this analysis;
report: folder that contains the report and code used herein;
data: folder that contains the data used herein;
scripts: folder that contains the scripts used herein;
info: folder that contains some useful information (i.e., papers, etc) used herein;
## R packages and versions used in these analyses
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ComplexHeatmap_2.2.0 gprofiler2_0.2.0 readxl_1.3.1
## [4] tidyr_1.1.2 ggplot2_3.3.2 Matrix_1.3-2
## [7] gplots_3.1.1 SeuratObject_4.0.0 Seurat_4.0.0
## [10] DT_0.16 dplyr_0.8.5
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_1.4-1 rjson_0.2.20
## [4] deldir_0.2-9 ellipsis_0.3.1 ggridges_0.5.3
## [7] circlize_0.4.12 base64enc_0.1-3 fs_1.5.0
## [10] GlobalOptions_0.1.2 clue_0.3-58 spatstat.data_1.7-0
## [13] farver_2.0.3 leiden_0.3.7 listenv_0.8.0
## [16] ggrepel_0.9.1 lubridate_1.7.9 codetools_0.2-18
## [19] splines_3.6.3 knitr_1.30 polyclip_1.10-0
## [22] jsonlite_1.7.2 bsplus_0.1.2 ica_1.0-2
## [25] cluster_2.1.2 png_0.1-7 uwot_0.1.10
## [28] shiny_1.6.0 sctransform_0.3.2 compiler_3.6.3
## [31] httr_1.4.2 assertthat_0.2.1 fastmap_1.1.0
## [34] lazyeval_0.2.2 later_1.1.0.1 htmltools_0.5.1.1
## [37] tools_3.6.3 igraph_1.2.6 gtable_0.3.0
## [40] glue_1.4.2 RANN_2.6.1 reshape2_1.4.4
## [43] Rcpp_1.0.6 spatstat_1.64-1 scattermore_0.7
## [46] cellranger_1.1.0 vctrs_0.3.6 nlme_3.1-152
## [49] crosstalk_1.1.0.1 lmtest_0.9-38 xfun_0.19
## [52] stringr_1.4.0 globals_0.14.0 mime_0.9
## [55] miniUI_0.1.1.1 lifecycle_0.2.0 irlba_2.3.3
## [58] gtools_3.8.2 goftest_1.2-2 future_1.21.0
## [61] klippy_0.0.0.9500 MASS_7.3-54 zoo_1.8-8
## [64] scales_1.1.1 promises_1.1.1 spatstat.utils_2.0-0
## [67] parallel_3.6.3 RColorBrewer_1.1-2 yaml_2.2.1
## [70] reticulate_1.18 pbapply_1.4-3 gridExtra_2.3
## [73] downloadthis_0.2.1 rpart_4.1-15 stringi_1.5.3
## [76] caTools_1.18.0 shape_1.4.5 rlang_0.4.10
## [79] pkgconfig_2.0.3 matrixStats_0.57.0 bitops_1.0-6
## [82] evaluate_0.14 lattice_0.20-44 ROCR_1.0-11
## [85] purrr_0.3.4 tensor_1.5 labeling_0.4.2
## [88] patchwork_1.1.1 htmlwidgets_1.5.3 cowplot_1.1.1
## [91] tidyselect_1.1.0 parallelly_1.22.0 RcppAnnoy_0.0.18
## [94] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
## [97] generics_0.1.0 pillar_1.4.7 withr_2.4.1
## [100] mgcv_1.8-35 fitdistrplus_1.1-3 RCurl_1.98-1.2
## [103] survival_3.2-11 abind_1.4-5 tibble_3.0.6
## [106] future.apply_1.7.0 crayon_1.3.4 KernSmooth_2.23-20
## [109] plotly_4.9.3 rmarkdown_2.5 GetoptLong_1.0.5
## [112] data.table_1.13.2 digest_0.6.27 xtable_1.8-4
## [115] httpuv_1.5.5 munsell_0.5.0 viridisLite_0.3.0
Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2020. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. 2018. “Integrating Single-Cell Transcriptomic Data Across Different Conditions, Technologies, and Species.” Nature Biotechnology 36: 411–20. https://doi.org/10.1038/nbt.4096.
Gu, Zuguang, Roland Eils, and Matthias Schlesner. 2016. “Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data.” Bioinformatics.
Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, et al. 2020. “Integrated Analysis of Multimodal Single-Cell Data.” bioRxiv. https://doi.org/10.1101/2020.10.12.335331.
Kolberg, Liis, and Uku Raudvere. 2020. Gprofiler2: Interface to the ’G:Profiler’ Toolset. https://CRAN.R-project.org/package=gprofiler2.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Satija, Rahul, Jeffrey A Farrell, David Gennert, Alexander F Schier, and Aviv Regev. 2015. “Spatial Reconstruction of Single-Cell Gene Expression Data.” Nature Biotechnology 33: 495–502. https://doi.org/10.1038/nbt.3192.
Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. 2019. “Comprehensive Integration of Single-Cell Data.” Cell 177: 1888–1902. https://doi.org/10.1016/j.cell.2019.05.031.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.